Colour videos with depth: acquisition, processing and evaluation
The human visual system lets us perceive the world around us in three dimensions
by integrating evidence from depth cues into a coherent visual model of the world. The equivalents in computer vision and computer graphics are geometric models,
which provide a wealth of information about represented objects, such as depth and
surface normals. Videos do not contain this information; they only provide per-pixel
colour information. In this dissertation, I therefore investigate a combination of videos
and geometric models: videos with per-pixel depth (also known as RGBZ videos).
I consider the full life cycle of these videos: from their acquisition, via filtering and
processing, to stereoscopic display.
I propose two approaches to capture videos with depth. The first is a spatiotemporal
stereo matching approach based on the dual-cross-bilateral grid – a novel real-time
technique derived by accelerating a reformulation of an existing stereo matching
approach. This is the basis for an extension which incorporates temporal evidence in
real time, resulting in increased temporal coherence of disparity maps – particularly
in the presence of image noise.
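The cost aggregation that the dual-cross-bilateral grid accelerates can be illustrated with a brute-force sketch on a single grayscale scanline. This is not the grid itself. It assumes adaptive support weights that combine a spatial Gaussian with colour-similarity weights taken from both the left and the right view (our reading of "dual-cross"), an absolute-difference matching cost and winner-takes-all disparity selection. The function name and the parameters `radius`, `sigma_c` and `sigma_s` are hypothetical.

```python
import math


def aggregate_disparity(left, right, max_disp, radius=3, sigma_c=10.0, sigma_s=3.0):
    """Winner-takes-all disparity for one grayscale scanline (brute force).

    Hypothetical sketch: left pixel x is assumed to match right pixel x - d.
    Returns one integer disparity per left pixel.
    """
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, math.inf
        for d in range(0, min(max_disp, x) + 1):
            num = den = 0.0
            for dx in range(-radius, radius + 1):
                xl = x + dx
                xr = xl - d
                if not (0 <= xl < n and 0 <= xr < n):
                    continue
                # Spatial weight: closer neighbours contribute more.
                ws = math.exp(-(dx * dx) / (2 * sigma_s ** 2))
                # Range weights from BOTH views: a neighbour only supports the
                # centre pixel if it resembles it in the left and right image.
                wl = math.exp(-abs(left[xl] - left[x]) / sigma_c)
                wr = math.exp(-abs(right[xr] - right[x - d]) / sigma_c)
                w = ws * wl * wr
                num += w * abs(left[xl] - right[xr])  # absolute-difference cost
                den += w
            cost = num / den  # dx = 0 is always valid, so den > 0
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities
```

The triple loop costs O(n · D · r) per scanline. A bilateral grid replaces the weighted neighbourhood sum with splat, blur and slice operations on a coarse grid, which is what makes real-time rates feasible.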
The second acquisition approach is a sensor fusion system which combines data
from a noisy, low-resolution time-of-flight camera and a high-resolution colour
video camera into a coherent, noise-free video with depth. The system consists
of a three-step pipeline that aligns the video streams, efficiently removes and fills
invalid and noisy geometry, and finally uses a spatiotemporal filter to increase the
spatial resolution of the depth data and strongly reduce depth measurement noise.
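The final upsampling step can be illustrated with joint bilateral upsampling, a standard colour-guided technique swapped in here for the dissertation's spatiotemporal filter, which additionally exploits temporal evidence. The sketch assumes an integer scale factor and a grayscale guide image. It also assumes that the preceding cleanup step has already marked invalid depth samples with a sentinel value (0.0). All names and parameters are hypothetical.

```python
import math


def joint_bilateral_upsample(depth_lo, guide_hi, scale, radius=1,
                             sigma_s=1.0, sigma_r=10.0, invalid=0.0):
    """Upsample a low-res depth map guided by a high-res grayscale image.

    Hypothetical sketch: `scale` is an integer factor between the grids and
    samples equal to `invalid` are treated as holes and skipped.
    """
    h_hi, w_hi = len(guide_hi), len(guide_hi[0])
    h_lo, w_lo = len(depth_lo), len(depth_lo[0])
    out = [[invalid] * w_hi for _ in range(h_hi)]
    for y in range(h_hi):
        for x in range(w_hi):
            cy, cx = y / scale, x / scale  # position on the low-res grid
            num = den = 0.0
            for qy in range(int(cy) - radius, int(cy) + radius + 1):
                for qx in range(int(cx) - radius, int(cx) + radius + 1):
                    if not (0 <= qy < h_lo and 0 <= qx < w_lo):
                        continue
                    d = depth_lo[qy][qx]
                    if d == invalid:  # skip holes, e.g. removed ToF pixels
                        continue
                    # Guide-image location corresponding to low-res sample q.
                    gy = min(qy * scale, h_hi - 1)
                    gx = min(qx * scale, w_hi - 1)
                    ws = math.exp(-((cy - qy) ** 2 + (cx - qx) ** 2) / (2 * sigma_s ** 2))
                    diff = guide_hi[y][x] - guide_hi[gy][gx]
                    wr = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                    num += ws * wr * d
                    den += ws * wr
            if den > 0:
                out[y][x] = num / den
    return out
```

Each high-resolution pixel averages nearby low-resolution depth samples but down-weights those whose guide intensity differs from its own, so depth discontinuities snap to colour edges instead of being blurred across them.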
I show that these videos with depth enable a range of video processing effects
that are not achievable using colour video alone. These effects critically rely on the
geometric information; for example, the proposed video relighting technique requires
high-quality surface normals to produce plausible results. In addition, I demonstrate
enhanced non-photorealistic rendering techniques and the ability to synthesise
stereoscopic videos, which allows these effects to be applied stereoscopically.
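A minimal Lambertian sketch shows why relighting depends on surface normals. Normals are estimated from a depth map by finite differences, and each pixel's colour is rescaled by max(0, n · l). It assumes an orthographic camera with unit pixel spacing, depth treated as a height field z(x, y) whose normals point towards +z, a known per-pixel albedo and a single directional light. It is not the proposed relighting technique, and the function names are hypothetical.

```python
import math


def normals_from_depth(depth):
    """Unit normals of the height field z(x, y) = depth[y][x] (hypothetical sketch)."""
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central differences, falling back to one-sided ones at the borders.
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(c * c for c in n))
            row.append(tuple(c / length for c in n))
        normals.append(row)
    return normals


def relight(albedo, normals, light_dir):
    """Lambertian relighting: colour = albedo * max(0, n . l) per channel."""
    norm = math.sqrt(sum(c * c for c in light_dir))
    l = tuple(c / norm for c in light_dir)
    out = []
    for a_row, n_row in zip(albedo, normals):
        out_row = []
        for a, n in zip(a_row, n_row):
            shade = max(0.0, sum(ni * li for ni, li in zip(n, l)))
            out_row.append(tuple(c * shade for c in a))
        out.append(out_row)
    return out
```

Because the normals come from depth derivatives, any noise in the depth map is amplified into visible shading artefacts, which is why plausible relighting needs high-quality geometry.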
These stereoscopic renderings inspired me to study stereoscopic viewing discomfort.
The result is a surprisingly simple computational model that predicts the
visual comfort of stereoscopic images. I validated this model in a perceptual
study, which showed that it correlates strongly with human comfort ratings. This
makes it ideal for automatic comfort assessment, without the need for costly and
lengthy perceptual studies.
360MonoDepth: High-Resolution 360° Monocular Depth Estimation
360° cameras can capture complete environments in a single shot, which
makes 360° imagery attractive for many computer vision tasks. However,
monocular depth estimation remains a challenge for 360° data, particularly
at high resolutions such as 2K (2048×1024) and beyond, which are important for
novel-view synthesis and virtual reality applications. Current CNN-based
methods do not support such high resolutions due to limited GPU memory. In this
work, we propose a flexible framework for monocular depth estimation from
high-resolution 360° images using tangent images. We project the 360°
input image onto a set of tangent planes that produce perspective views, which
are suitable for the latest and most accurate perspective
monocular depth estimators. To achieve globally consistent disparity estimates,
we recombine the individual depth estimates using deformable multi-scale
alignment followed by gradient-domain blending. The result is a dense,
high-resolution 360° depth map with a high level of detail, even for
outdoor scenes, which existing methods do not support. Our source code and
data are available at https://manurare.github.io/360monodepth/.
Comment: CVPR 2022.
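Two steps of this pipeline can be sketched compactly. `tangent_to_lonlat` applies the standard inverse gnomonic projection, which maps a point (x, y) on a plane tangent to the sphere at longitude λ0 and latitude φ0 to spherical coordinates. Sampling the equirectangular input at these positions yields a perspective tangent view. `fit_scale_offset` is a deliberately simplified, global stand-in for the deformable multi-scale alignment: it fits a single scale and offset by least squares so that one view's depths agree with another's on their overlapping samples. The placement of tangent planes, the equirectangular pixel convention and all function names are assumptions.

```python
import math


def tangent_to_lonlat(x, y, lon0, lat0):
    """Inverse gnomonic projection: tangent-plane (x, y) -> (lon, lat) in radians."""
    rho = math.hypot(x, y)
    if rho == 0.0:
        return lon0, lat0  # the tangent point itself
    c = math.atan(rho)  # angular distance from the tangent point
    lat = math.asin(math.cos(c) * math.sin(lat0)
                    + y * math.sin(c) * math.cos(lat0) / rho)
    lon = lon0 + math.atan2(x * math.sin(c),
                            rho * math.cos(lat0) * math.cos(c)
                            - y * math.sin(lat0) * math.sin(c))
    return lon, lat


def lonlat_to_equirect(lon, lat, width, height):
    """Continuous pixel coordinates in an equirectangular image (assumed convention)."""
    lon = (lon + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    u = (lon + math.pi) / (2 * math.pi) * width
    v = (math.pi / 2 - lat) / math.pi * height
    return u, v


def fit_scale_offset(src, dst):
    """Least-squares (s, t) minimising sum((s * a + t - b)^2) over paired samples."""
    n = len(src)
    sa, sb = sum(src), sum(dst)
    saa = sum(a * a for a in src)
    sab = sum(a * b for a, b in zip(src, dst))
    s = (n * sab - sa * sb) / (n * saa - sa * sa)
    t = (sb - s * sa) / n
    return s, t
```

A single global scale and offset cannot correct errors that vary within a tangent view; a spatially varying (deformable) alignment can, and gradient-domain blending then hides the remaining seams between views.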
Real-time Global Illumination Decomposition of Videos
We propose the first approach for the decomposition of a monocular color
video into direct and indirect illumination components in real time. We
retrieve, in separate layers, the contribution made to the scene appearance by
the scene reflectance, the light sources and the reflections from various
coherent scene regions to one another. Existing techniques that invert global
light transport require image capture under multiplexed controlled lighting, or
only enable the decomposition of a single image at slow off-line frame rates.
In contrast, our approach works for regular videos and produces temporally
coherent decomposition layers at real-time frame rates. At the core of our
approach are several sparsity priors that enable the estimation of the
per-pixel direct and indirect illumination layers based on a small set of
jointly estimated base reflectance colors. The resulting variational
decomposition problem uses a new formulation based on sparse and dense sets of
non-linear equations that we solve efficiently using a novel alternating
data-parallel optimization strategy. We evaluate our approach qualitatively and
quantitatively, and show improvements over the state of the art in this field,
in both quality and runtime. In addition, we demonstrate various real-time
appearance editing applications for videos with consistent illumination
- …
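The role of a small set of base reflectance colours can be illustrated with a toy per-pixel split. The sketch assumes the base colours are already known, whereas the approach described above estimates them jointly. It assigns each pixel the base colour with the closest chromaticity and attributes the remaining per-channel ratio to an RGB shading layer. It does not separate direct from indirect illumination and uses no sparsity priors or optimization. Its function name is hypothetical.

```python
def decompose(image, palette, eps=1e-6):
    """Split RGB pixels into a palette reflectance layer and an RGB shading layer.

    Hypothetical toy sketch: image is a 2D list of (r, g, b) tuples, palette a
    list of known base colours. Per channel, pixel ~= reflectance * shading.
    """
    def chroma(c):
        s = sum(c) + eps
        return tuple(v / s for v in c)  # colour with brightness divided out

    pal_chroma = [chroma(p) for p in palette]
    reflectance, shading = [], []
    for row in image:
        r_row, s_row = [], []
        for px in row:
            pc = chroma(px)
            # Nearest base colour by chromaticity, ignoring brightness.
            k = min(range(len(palette)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(pc, pal_chroma[i])))
            base = palette[k]
            r_row.append(base)
            # Whatever the base colour does not explain is treated as shading.
            s_row.append(tuple(v / max(b, eps) for v, b in zip(px, base)))
        reflectance.append(r_row)
        shading.append(s_row)
    return reflectance, shading
```

Keeping the shading layer in RGB rather than grayscale leaves room for coloured illumination, such as light reflected from one coloured scene region onto another.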